Abstract. The original European ESPRIT ProCoS I and II projects on Provably
Correct Systems took place around a quarter of a century ago. Since then
the legacy of the initiative has spawned many researchers with careers in formal
methods. One of the leaders on the ProCoS projects was Ernst-Rüdiger Olderog.
This paper charts the influence of the ProCoS projects and the subsequent ProCoS-WG
Working Group, using Prof. Dr Olderog as an example. The community of
researchers surrounding an initiative such as ProCoS is considered in the context
of the social science concept of a Community of Practice (CoP) and the
collaborations undertaken through coauthorship of and citations to publications.
Consideration of citation metrics is also included.


1 Background 

Historically, the creation of scientific knowledge has relied on collaborative efforts
by successive generations through the centuries [37]. Scientific advances are gradually
developed by a community of researchers over time (e.g., the abstract algebra of
the French mathematician Évariste Galois leading to Galois theory and group theory
[8]). A scientific theory can be modelled as a mathematical graph of questions posed
by scientists (represented by the vertices of the graph) and the corresponding answers
(modelled by arcs connecting the vertices in the graph) [34]. The answers to questions
lead to further questions and so the process continues, potentially ad infinitum. In
general, mathematical logic underlies the valid reasoning that is required for worthwhile
development of scientific theories and knowledge [17].

In recent years, the speed of transmission and the quantity of knowledge available
have increased dramatically, especially with improvements in the Internet and
specifically the increasing use of the World Wide Web [1]. Whereas previously academic
papers were published on paper in journals, conference proceedings, technical reports,
books, etc., now all these means of communication can be, and often are, delivered
largely electronically online. The plethora of information has also become indexed more
and more effectively, especially with the advent of the PageRank algorithm as used by
Google [27].

In this paper, we use the European ProCoS (“Provably Correct Systems”) initiative
of the 1990s [2,26] as an example of a foundational community of academic researchers
working in various areas towards a common aim. We consider the related issue of the
production of publications and their citations as an important aspect of scholarly
activity. We model some aspects of this formally using the Z notation [5,35] to help in
disambiguating some of the concepts that are often left somewhat nebulous in social
science (e.g., with respect to a Community of Practice [38,39]).

Section 2 introduces the European collaborative ProCoS projects and the subsequent
Working Group of the 1990s. In Section 3, we present an example ProCoS researcher
and their relationship with other researchers through coauthorship and citations,
with visualizations of these relationships. The section also formalizes the relationship
of researchers in an academic community such as that generated by ProCoS and extends
this to cover a Community of Practice. Section 4 considers some of the citation metrics
that are available for measuring a researcher’s influence, including their shortcomings,
using publication corpora that are now available online. Finally, Section 5 provides a
conclusion and some possible future directions.


2 The ProCoS Community 

In this section, we consider the development of the ProCoS initiative and the community
that it has created. The seeds of the ProCoS projects on “Provably Correct Systems”
were sown in the 1980s [2,26], coming out of the formal methods community [3,9]. The
CLInc Verified Stack initiative of Computational Logic Inc. in the USA [28,40], using
the Boyer-Moore Nqthm theorem prover to verify a linked set of hardware, kernel
and software in a unified framework, was an inspiration for the initial ProCoS project.
Whereas CLInc was a closely connected set of mechanically proved layers, ProCoS
concentrated more on possible formal approaches to the issues of verifying a complete
system at more levels, from requirements through specification and design to compilation,
using a diverse set of partners around Europe with different backgrounds, expertise, and
interests, but with a common overall goal. A ProCoS “tower” with appropriate formalisms
and approaches was proposed to investigate proving a system correct in a linked way
at the various levels of abstraction. The approach was based around the Occam parallel
programming language and Transputer microprocessor architecture. A gas burner was
used as a motivating example for much of the work.

The first ProCoS project was for 2½ years (1989-1991) with seven academic partners
[2]. The subsequent ProCoS II project (1992-1995) involved a more focused set of
four academic partners [12]. Subsequently a ProCoS-WG Working Group of 25 partners
allowed a more diverse set of researchers to engage in the ProCoS approach, including
industrial partners [13].

The ProCoS projects worked on various aspects of formal system development at
different related levels of abstraction, including program compilation from an
Occam-based programming language to a Transputer-based instruction set [7,20,26]. A gas
burner was used extensively as a case study and this helped to inspire the development
of Duration Calculus for succinctly formalizing real-time requirements [41]. A novel
provably correct compiling specification approach was also developed using a compiling
relation for the various constructs in the language that could be proved using algebraic
laws [24]. This was later extended to a larger language including recursion [18,19].
The project used algebraic and operational semantics in its various approaches.
The relationship between these and also denotational semantics was later demonstrated
more universally in the Unified Theories of Programming (UTP) approach [23].

A Community of Practice (CoP) [38,39] is a widely accepted social science approach
used as a framework in the study of the community-based process of producing a
particular Body of Knowledge (BoK) [10]. An example of a CoP is that generated by the
ProCoS initiative in the area of provably correct systems [7,20], as discussed later in
this paper. The important elements of a CoP include a domain of common interest (e.g.,
provably correct systems), a community willing to engage with each other (e.g., members
of the ProCoS projects and Working Group), and exploration of new knowledge to
improve practice (e.g., Duration Calculus [41] and later UTP [23]).

3 A Community around a Researcher 

Here we use the German computer scientist and one of the original leaders on the ProCoS
project, Ernst-Rüdiger Olderog [29-31] of the University of Oldenburg, as an
example of a leading member of a community of researchers. The visualization
capabilities of the online Microsoft Academic Search facility
(http://academic.research.microsoft.com) are used to illustrate this. For example, see
Figure 1 for E.-R. Olderog’s home page on Academic Search. The Academic Search facilities
include graphical presentation of direct relationships of collaborators as coauthors of
publications, direct citations of other researchers to an individual’s publications, and
indirect connections between any two authors through intermediate coauthors.

Academic Search also lists the coauthors, conferences and journals for each author
in descending order of publication count and the main keywords associated with the
publications of an author (see Figure 1). For example, three out of the top five
coauthors of E.-R. Olderog were associated with the ProCoS project. In addition, he is
particularly active in the International Colloquium on Automata, Languages, and
Programming (ICALP), the Integrated Formal Methods (IFM) conferences, and the Acta
Informatica and Theoretical Computer Science journals (again, see Figure 1). Important
keywords include “Duration Calculus”, a direct (and unpredicted) result of the ProCoS
project.

The links between coauthors and citing authors form mathematical graphs [11].
These can be modelled using relations. The Z notation [21,35] is a convenient notation
to present these formally, as previously demonstrated in [6]. Here we concentrate on
authors rather than publications and the paths of coauthors that connect researchers.
In particular, we augment this model to consider the collaborative distance between
an arbitrary pair of authors in terms of coauthorship. We model all the possible paths
between such authors as a set of sequences of authors where the two authors under
consideration are the first and last author in each of the sequences. The two authors
do not occur elsewhere within these sequences and no other author is repeated in the
sequences either.

We use the concept of graphs in our mathematical modelling. A general graph can 
be modelled as a relation in Z, using a generic constant on any set X: 

\[
graph : X \leftrightarrow X
\]

Fig. 1. Publication and citation statistics for Ernst-Rüdiger Olderog on Academic Search.


We can refine a general graph and consider a model for an undirected graph in Z:

\[
\begin{array}{l}
ugraph : \mathbb{P}\, graph \\
\hline
ugraph = ugraph^{\sim} \\
ugraph \cap \mathrm{id}\, X = \varnothing
\end{array}
\]

Here all nodes (authors) are connected in both directions (as coauthors) and also a node
cannot be connected to itself (i.e., an author cannot be a coauthor with themselves).
In the above definition, “$\sim$” indicates the inverse of a relation and “id” produces
the identity relation from a set.
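
The two constraints on an undirected graph can be checked directly on a finite relation. The following is a minimal Python sketch, assuming a relation is represented as a set of ordered pairs; the function names and the sample author labels are illustrative assumptions and are not part of the Z model.

```python
def inverse(rel):
    """Return the inverse of a relation given as a set of ordered pairs."""
    return {(b, a) for (a, b) in rel}


def is_undirected_graph(rel):
    """Check that rel equals its own inverse and contains no (x, x) pairs."""
    symmetric = rel == inverse(rel)
    irreflexive = all(a != b for (a, b) in rel)
    return symmetric and irreflexive


# Hypothetical coauthorship pairs, recorded in both directions.
coauthors = {("A", "B"), ("B", "A"), ("B", "C"), ("C", "B")}

print(is_undirected_graph(coauthors))                 # both constraints hold
print(is_undirected_graph(coauthors | {("A", "A")}))  # a self-loop breaks irreflexivity
```

Storing each coauthorship in both directions mirrors the symmetry constraint; an implementation could equally store unordered pairs and derive the symmetric relation on demand.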

Academic communities consist of people that have authored publications. In Z, this 
can be modelled as a given set: 
\[
[PEOPLE]
\]

In an academic community of researchers for a particular area, there is often a main
key researcher leading the field’s publications. Then there is a wider group of
researchers that have published papers in the field. Typically published works have a
number of coauthors. Published authors may be related to other authors transitively
through coauthorship. Authors may also be cited by other published authors, even if
not related through coauthorship. These relationships can be modelled formally using
graphs:

\[
\begin{array}{l}
\textit{Researchers} \\
\hline
main : PEOPLE \\
published : \mathbb{F}_1\, PEOPLE \\
coauthors, related, citing\_authors : PEOPLE \leftrightarrow PEOPLE \\
\hline
main \in published \\
coauthors \subseteq ugraph[published] \\
related \subseteq ugraph[published] \\
related = coauthors^{+} \\
citing\_authors \subseteq graph[published]
\end{array}
\]

Note that “$\mathbb{F}_1$” indicates a finite non-empty set and “$+$” indicates
irreflexive transitive closure above.
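
To illustrate how the related relation follows from direct coauthorship, the following Python sketch computes a transitive closure over a small, hypothetical set of coauthor pairs. The function names and the author labels are assumptions for illustration. Reflexive pairs are removed after closing the relation, which is an assumption made here so that the result stays irreflexive as the schema requires.

```python
def transitive_closure(rel):
    """Return the transitive closure of a relation given as a set of ordered pairs."""
    closure = set(rel)
    while True:
        # Compose the current relation with itself to find pairs two steps apart.
        new_pairs = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new_pairs <= closure:
            return closure
        closure |= new_pairs


def related_authors(coauthors):
    """Authors linked by one or more coauthorship steps, without (x, x) pairs."""
    return {(a, b) for (a, b) in transitive_closure(coauthors) if a != b}


# Hypothetical data: M wrote with A, and A wrote with B (stored symmetrically).
coauthors = {("M", "A"), ("A", "M"), ("A", "B"), ("B", "A")}
related = related_authors(coauthors)

print(("M", "B") in related)  # M and B are related only through A
```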

The Academic Search facility enables graphical visualization of the coauthors (e.g.,
see Figure 2) and citing authors (e.g., see Figure 3) for any particular author in its
database. Figure 2 provides a pictorial view of a subset of the relation
$\{author\} \lhd related \rhd coauthors(\!|\{author\}|\!)$ (where “$\lhd$” indicates
domain restriction of a relation, “$\rhd$” indicates range restriction of a relation,
and “$(\!|\ldots|\!)$” indicates a relational image of a subset of the domain) for a
specific author (in this case Ernst-Rüdiger Olderog) at the centre. Connections between
coauthors who have themselves written publications together can be shown as well, in
addition to coauthorship with the main author under consideration. This results in
groupings of coauthors that are interconnected in a way that can be seen visually very
quickly. For example, in this case all the coauthors associated with the ProCoS project
are in the lower right-hand quadrant, including the author of this paper.

Figure 3 gives a pictorial view of a subset of the relation
$\{author\} \lhd citing\_authors$, again for a specific author located at the top left
position in the diagram. Citations from authors involved with the ProCoS project, dating
from Olderog’s early career, are largely grouped on the left-hand side of the diagram.
Later citations are to the right.

Next we consider paths between pairs of nodes (authors): 

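\[
\begin{array}{l}
path : (X \times X) \leftrightarrow \mathrm{iseq}\, X \\
\hline
\forall x_1, x_2 : X;\ s : \mathrm{iseq}\, X \mid \# s > 1 \bullet \\
\quad (x_1, x_2) \mapsto s \in path \Leftrightarrow \\
\qquad head\ s = x_1 \land \\
\qquad last\ s = x_2 \land \\
\qquad (\forall n : \mathbb{N}_1 \mid n < \# s \bullet (s\ n, s(succ\ n)) \in graph)
\end{array}
\]

Fig. 2. Primary coauthors of Ernst-Rüdiger Olderog on Academic Search.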


The paths are modelled as injective sequences (“iseq”) of length more than one, where
the first and last entries in the sequences are the two nodes under consideration and
all adjacent pairs in the sequence are directly connected in the graph. Because the
sequences are injective, no nodes are repeated in these sequences. This means that the
pair of nodes under consideration are always two different nodes.
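
As an informal illustration of the path relation, the following Python sketch enumerates every repetition-free path between two nodes by depth-first search. It assumes the graph is given as a dictionary mapping each node to the set of its direct neighbours; the function name and the four-node example graph are hypothetical.

```python
def simple_paths(graph, start, end):
    """Yield every path from start to end that never repeats a node.

    graph maps each node to the set of its direct neighbours.
    """
    if start == end:
        return  # an injective sequence of length > 1 cannot start and end at one node

    def extend(path):
        for nxt in sorted(graph.get(path[-1], ())):
            if nxt in path:
                continue  # keep the sequence injective
            if nxt == end:
                yield path + [nxt]
            else:
                yield from extend(path + [nxt])

    yield from extend([start])


# Hypothetical coauthorship graph: A-B, A-C, B-C and C-D.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}

for p in simple_paths(graph, "A", "D"):
    print(p)
```

Exhaustive enumeration grows quickly with the size of the graph, so in practice only the shortest paths are usually of interest.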


The collaborative distance of two authors can be of particular interest. Two authors 
may be connected in many different ways by sequences of coauthors or even in no 
way whatsoever (effectively an infinite collaborative distance). The shortest (minimum) 
connection between two different authors is of special interest. 

Fig. 3. Primary citing authors for Ernst-Rüdiger Olderog on Academic Search.

In recent years, the “Erdős number” (i.e., the collaborative distance from Erdős) has
become a metric for involvement in mathematical and even computer science research
[11]. Paul Erdős, a very collaborative 20th century mathematician, is considered to have
an Erdős number of 0. His direct coauthors (511 of them) have an Erdős number of 1.
Other authors can be assigned a number that is the minimum length of the coauthorship
path that links them with Erdős, assuming there is such a path. More generally,
considering a main author, the collaborative distance of other authors from the main
author can be considered, or indeed between any arbitrary pair of published authors.
Authors who have written publications with coauthors of Erdős (the main author) but not
with Erdős himself have an Erdős number of 2. This process can be continued in an
iterative manner, using a path of minimum length to determine the Erdős number when
there is more than one path, as is typically the case for active researchers in the field.
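
The iterative process described above corresponds to a breadth-first search outwards from the main author. The following Python sketch is one way to compute a collaborative distance under that reading; the function name, the dictionary representation of coauthorship and the author labels are assumptions for illustration, and an unconnected pair is reported as None rather than as an infinite distance.

```python
from collections import deque


def collaborative_distance(coauthors, source, target):
    """Return the minimum number of coauthorship links between two authors.

    coauthors maps each author to the set of their direct coauthors.
    Returns None when no coauthorship path exists.
    """
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        author, dist = queue.popleft()
        for other in coauthors.get(author, ()):
            if other == target:
                return dist + 1
            if other not in seen:
                seen.add(other)
                queue.append((other, dist + 1))
    return None


# Hypothetical network: a chain Main-P-Q-R plus an unconnected author S.
network = {
    "Main": {"P"},
    "P": {"Main", "Q"},
    "Q": {"P", "R"},
    "R": {"Q"},
    "S": set(),
}

print(collaborative_distance(network, "Main", "R"))
print(collaborative_distance(network, "Main", "S"))
```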
Fig. 4. A selection of connections with Paul Erdős for Ernst-Rüdiger Olderog on Academic
Search.


Academic Search can provide a graphical view of a number of the shortest paths
between any two authors, with Paul Erdős provided as the standard second author
unless a different author is explicitly selected. Figure 4 shows an example for
Ernst-Rüdiger Olderog. Here, five paths with a collaborative distance of four are shown.
The five researchers on the right directly connected to Erdős have an Erdős number of 1.
Of the five researchers directly connected to Olderog on the left, one (C. A. R. Hoare)
was also on the ProCoS project. Of course the database of authors and publications may
not be complete or accurate (e.g., especially for authors with common names) and there
could be shorter paths between two authors in practice.

The main author, as introduced earlier, could be considered as a coordinator of
a Community of Practice [38]. Direct coauthors of the main coordinator take on a
major editorial role in the CoP. Those that are related to the main author by transitive
coauthorship are active members. These people form the core of the CoP membership.
Those that cite any of the above are peripheral members of the CoP. Finally, other
unrelated published authors are considered to be outsiders to the CoP, but are potential
members.


\[
\begin{array}{l}
\textit{CoP} \\
\hline
Researchers \\
editorial, active, core, peripheral, cop, outsiders : \mathbb{F}\, PEOPLE \\
\hline
editorial = coauthors(\!|\{main\}|\!) \\
active = related(\!|\{main\}|\!) \setminus editorial \\
core = \{main\} \cup editorial \cup active \\
peripheral = citing\_authors(\!|core|\!) \setminus core \\
cop = core \cup peripheral \\
outsiders = published \setminus cop
\end{array}
\]


In the context of the ProCoS example based on E.-R. Olderog as the main author,
those related by transitive coauthorship could be considered core members. The
collaborative distance could be limited to some set maximum if desired. Authors that have
cited core ProCoS researchers are peripheral members of the ProCoS community. All
other published researchers are considered outsiders to the community. Of course this
formalization could be varied if desired, but it gives a precise definition for an informal
social science concept of a CoP.
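
The membership categories of the CoP schema can be computed mechanically from the two relations. The following Python sketch is a minimal illustration under stated assumptions: relations are sets of ordered pairs, citing_authors holds (cited, citing) pairs so that its relational image over the core gives the citing authors, and all function names and author labels are hypothetical.

```python
def image(rel, sources):
    """Relational image: everything related to some member of sources."""
    return {b for (a, b) in rel if a in sources}


def reachable(rel, start):
    """All nodes reachable from start in one or more steps, excluding start."""
    found, frontier = set(), {start}
    while frontier:
        frontier = image(rel, frontier) - found - {start}
        found |= frontier
    return found


def classify(main, published, coauthors, citing_authors):
    """Split published authors into the categories used in the CoP schema."""
    editorial = image(coauthors, {main})
    active = reachable(coauthors, main) - editorial
    core = {main} | editorial | active
    peripheral = image(citing_authors, core) - core
    cop = core | peripheral
    return {
        "editorial": editorial,
        "active": active,
        "core": core,
        "peripheral": peripheral,
        "cop": cop,
        "outsiders": published - cop,
    }


# Hypothetical community: M wrote with A, A wrote with B, C cites A, D is unconnected.
published = {"M", "A", "B", "C", "D"}
coauthors = {("M", "A"), ("A", "M"), ("A", "B"), ("B", "A")}
citing_authors = {("A", "C"), ("M", "A")}  # assumed orientation: (cited, citing)

roles = classify("M", published, coauthors, citing_authors)
print(sorted(roles["core"]), sorted(roles["peripheral"]), sorted(roles["outsiders"]))
```

Limiting the collaborative distance to a set maximum, as suggested above, would amount to bounding the number of iterations in the reachability loop.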


4 Citation Metrics 


In the previous section we considered published authors and their communities of
researchers. Here we consider individual authors and their publications. Nowadays there
are various web-based facilities that index academic publications online, including
facilities that allow citation data to be calculated automatically. For example, Google
has a specific search facility for indexing scholarly publications through Google Scholar
(http://scholar.google.com). Books are also available online through Google
Books (http://books.google.com), although this does not record citation
information. Google Scholar has very complete and up-to-date information compared to
other sources [15], although this can be less reliable due to the lack of human checking.
However, Google Scholar provides a facility for individuals to generate a personalized
and publicly available web page presenting their own publications with citation
information that can be hand-corrected by the author involved as needed at any time.

The automated search through crawling of websites including publications with
references that is undertaken by Google Scholar is fairly reliable for publications with
a reasonable number of citations. The various citations allow automated improvement
of the information. Typically for a given author on their personalized page, the
publications list includes a “long tail” of uncited or lesser cited publications, some of
which can be spurious and with poor default information. These can be edited or deleted
as required. In addition to valid publications, Google also trawls online programme
committee data for conferences. In these cases all the committee members are normally
considered to be authors by Google Scholar.

National governments and other institutions are increasingly keen on measuring
research output by academics, with potentially significant implications for funding for
universities. For example in the United Kingdom, the Research Excellence Framework
(REF) in 2014 and next to be held in 2020 - formerly the Research Assessment Exercise
(RAE) until 2008 - assesses all research-active academics that UK universities wish to
return in various Units of Assessment (UoAs) covering all the standard academic
disciplines. There was a move to use Google Scholar for REF 2014 Sub-panel 11 (computer
science), but in the end this was not possible for commercial reasons [33].

The UK REF exercise is used to allocate limited general research funding to
universities. Up to four “best” papers are selected by individual academics from the
period in question (most recently six years) for assessment, normally assessed by peer
review. Ideally, these should be in highly rated journals and present significant novel
research. The number of citations can be a very important factor in determining the
quality of a specific paper since it provides an indication of its influence in the field.
Of course recent papers may not have had time to receive a significant number of
citations, even if they later prove to be influential. A better indication could be
obtained by considering citations to papers in the previous assessment period, but this
is not undertaken in the UK REF at least.

There are various possible ways to measure the influence of a researcher through
their publications. One of the simplest is the number of citations. This can vary widely
between disciplines, and of course depends on the length of the career so far for a
researcher, as well as patterns of collaboration with other researchers. Joint
publications mean that a researcher can appear much more productive than if only
single-author publications are produced. Thus the sciences where multi-authored papers
are the norm fare better for citation counts than the humanities where single-author
books on research are more normal. However within a given discipline (e.g., computer
science), comparison using citation metrics has some validity.

The total number of citations can be deceptive for reasons dependent on the field.
For researchers with a reasonable number of publications, there is a standard pattern
to the distribution of citations for individual publications [14]. Normally a researcher
has a small number of publications with significant numbers of citations (and thus
influence). Conversely there is typically a much larger number of publications with only
a few citations (and hence much less influence). In practice the small number of highly
cited publications are much more important in terms of influence than the larger
number of lesser-cited publications. Yet the total number of citations for the latter may
be significant in size compared with the former.

To overcome these issues, citation metrics beyond simple citation counts have
been developed. One of the most popular is the h-index [22]. This is the largest number
h such that h publications by an individual author each have h or more citations. This
provides a reasonably simple measure of the influence of an author through their most
highly cited publications. All other lesser-cited publications have no influence on this
metric. Google Scholar includes this metric automatically on personal pages generated
by individual researchers.

The h-index can be formalized using the Z notation [5,35], for example. This was
done in a functional style in an earlier paper [6]. Here we present a more relational and
arguably more abstract definition. As in the previous paper, we use a Z “bag” (sometimes
also called a multiset) to model the citation count for each individual publication.
We use a generic definition for flexibility.

\[
\begin{array}{l}
h\_index : \mathrm{bag}\, X \rightarrow \mathbb{N} \\
\hline
\forall b : \mathrm{bag}\, X;\ h : \mathbb{N} \bullet
  h\_index\ b = h \Leftrightarrow h = \# \{ x : X \mid b(x) \geq h \}
\end{array}
\]


Note that Z bags are defined as $\mathrm{bag}\, X == X \rightharpoonup \mathbb{N}_1$, a
partial function from any generic set X to non-zero natural numbers. X can be used to
represent cited publications, for example, mapped to the number of citations associated
with each of these publications. A publication with no citations will not be covered in
this mapping.
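
For comparison with the relational definition, the following Python sketch computes an h-index in a direct, functional way. It assumes the usual formulation (the largest h such that at least h publications have h or more citations each) and represents the bag as a dictionary from publication identifiers to positive citation counts; the function name and the sample data are hypothetical.

```python
def h_index(citations):
    """Largest h such that at least h publications have h or more citations each.

    citations maps each cited publication to its (non-zero) citation count,
    mirroring a Z bag; uncited publications are simply absent.
    """
    counts = sorted(citations.values(), reverse=True)
    h = 0
    for rank, count in enumerate(counts, start=1):
        if count >= rank:
            h = rank
        else:
            break  # counts only decrease from here, so no larger h is possible
    return h


# Hypothetical citation counts for five publications.
sample = {"p1": 10, "p2": 8, "p3": 5, "p4": 4, "p5": 3}
print(h_index(sample))
```

Sorting the counts in descending order means that the rank of a publication equals the number of publications with at least as many citations, which is what the comparison against the rank relies on.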

The h-index metric should be treated with some caution since comparison across
different academic disciplines may not be valid due to differences in patterns of
publication. In humanities, single-author publications are the norm, as previously
mentioned. In computer science, a small number of coauthors is typical (e.g., two to three
on average), with acknowledgements to others that have helped with the research in some
smaller way. A supervisor may be named as second author of a publication by a doctoral
student, whereas in humanities the supervisor may well not be named. In chemistry, a
larger number of coauthors is typical, with a team of people (e.g., ten or more) working
on a problem, providing different expertise. Indeed, coauthors may not have been
involved in writing the paper at all, but may have given help with an experiment, for
example. In physics, very large numbers of coauthors are possible for sizable and
expensive initiatives (perhaps even hundreds, e.g., experiments at CERN).

On an individual author’s personalized Google Scholar page, as set up by and
editable by the author, the number of citations for each publication and the total sum of
citations, together with the author’s h-index and also i10-index (the number of
publications with 10 or more citations [6]), are displayed, for the last six years and for
all time. A particular aspect that is lacking in Google Scholar is any significant
visualization facility. The only visual output provided is in the form of bar charts of
the number of citations each year for authors and also for individual papers. This is
useful but limited.

As an alternative to Google Scholar, Microsoft Research’s Academic Search (see
http://academic.research.microsoft.com) provides another online database
of academic publications. This was initiated at the Microsoft Beijing research
laboratory in China. Unfortunately the resource is by no means as complete or up to date
as the information provided by Google Scholar, although historical coverage of journals
in the sciences is good. It appears that regular updates ceased in 2012. On the positive
side, Academic Search does provide much better visualization facilities compared to
Google Scholar, as illustrated in Section 3. It has also been possible for any individual
to submit corrections regarding any publication entry within the database. These have
been checked by a human before being accepted (after some variable delay).

In addition to the h-index, Academic Search also provides the “g-index” [16] for
each author. This is a refinement of the h-index and arguably provides a somewhat
improved indication of an author’s academic influence. The g-index measure gives very
highly cited publications (e.g., a significant book or foundational paper) more weight
than with the h-index, where additional citations over and above the h-index itself for
individual publications have no effect on its value. In the case of the g-index, the g
most-cited papers must have at least $g^2$ citations in combination. Thus very highly
cited publications do contribute additional weight to the g-index. Indeed, the value of
the g-index is always at least as great as the h-index for a given author and is greater
if there are some very highly cited publications.

In [6], the g-index was formally defined in Z using a functional style, close to how its
calculation could be implemented. Here we use a more relational style of specification,
arguably more abstract and certainly less easily directly implemented in an imperative
programming language:

\[
\begin{array}{l}
g\_index : \mathrm{bag}\, X \rightarrow \mathbb{N} \\
\hline
\forall b : \mathrm{bag}\, X;\ g : \mathbb{N} \bullet \\
\quad g\_index\ b = g \Leftrightarrow \\
\qquad g * g \leq max \{ a : \mathrm{bag}\, X \mid a \subseteq b \land \# a = g \bullet \Sigma a \}
  < (g + 1) * (g + 1)
\end{array}
\]

Note that the $\Sigma$ function calculates the sum of all items in a bag and was defined
formally in [6].
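
By way of contrast with the relational specification, the following Python sketch computes a g-index imperatively. It assumes the usual formulation (the largest g such that the g most-cited publications together have at least g squared citations) and additionally assumes that g cannot exceed the number of cited publications; the function name and the sample data are hypothetical.

```python
def g_index(citations):
    """Largest g such that the g most-cited publications total at least g*g citations.

    citations maps each cited publication to its (non-zero) citation count.
    In this sketch g is capped at the number of cited publications.
    """
    counts = sorted(citations.values(), reverse=True)
    g, total = 0, 0
    for rank, count in enumerate(counts, start=1):
        total += count  # running sum of the `rank` most-cited publications
        if total >= rank * rank:
            g = rank
    return g


# Hypothetical citation counts for five publications.
sample = {"p1": 10, "p2": 8, "p3": 5, "p4": 4, "p5": 3}
print(g_index(sample))
```

For this sample the running totals are 10, 18, 23, 27 and 30, each of which is at least the square of its rank, so every one of the five publications counts towards the g-index.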

Other citation indices include the i10-index as used on Google Scholar, indicating
the number of publications with ten or more citations [6], and the lesser used f-index
[25], designed to be fairer in determining researchers with influence across more
communities. With a plethora of citation indices, caution should be taken as to their
reliability in practice. Encouraging the production of more papers with incremental
results can be detrimental to the advancement of scientific knowledge [32].

5 Conclusion 

This paper has presented the collaborative European ESPRIT ProCoS projects and
Working Group on Provably Correct Systems of the 1990s and the community that
this formed. It considers the framework of a Community of Practice (CoP) in the context
of collaboration and influence within such a community through coauthorship. We
have also considered citations to individual publications for a particular author.
The development of knowledge depends on such communities of researchers, which
are created and then transmogrify as needed, depending on the interests of individual
researchers interacting in the larger community.

A case study of an individual involved with the ProCoS project has been included
with visualization of connections between researchers. Key concepts have been formalized
using the Z notation. Further formalizations and considerations of sociological
issues within the CoP framework could be considered in more detail in the future.



As well as communities of researchers, this paper has also discussed citation metrics
for individual researchers, which have become increasingly widespread. It should be
noted that the relevance of these, like most metrics, is a matter of debate and any such
measurements should always be treated with caution and interpreted in an appropriate
manner. In particular, the citations at any particular point in time are a snapshot with
no precise indication of future citations. In addition, general concepts are often not
cited at all. Many disciplines have a practice of including “passive” authors that have
not directly undertaken the research, perhaps acting as a supervisor or funder instead.
These and other issues mean that all citation statistics should be used with caution.

Possible future directions include considering the graphs of relationships between 
authors and publications more holistically to model movements and influences, but this 
is beyond the scope of this paper. 